Ensemble learning, which involves combining the opinions of multiple
experts to arrive at a better result, has been used for centuries. This work
reviews the major voting methods in ensemble learning and focuses on a new
method for combining the results of individual learners. The method depends
on the relative accuracy and diversity of teams. Instead of trying to assign a
weight to each trainer, the concept of diversity teams is presented. Each team
votes as one player; however, the individual accuracies of each learner are
still used. The concept of relaxing parameters that deal with each team as one
player is presented. Our experiments demonstrate that traditional ensemble
voting methods outperform individual learners. There is a limit to the
superiority of the ensemble learner that no ensemble learner can exceed. The
proposed voting method gives the same results as the traditional ensemble
voting methods; however, it can achieve a different diversity from the
traditional voting method, and different diversities for different values of
the relaxing parameter.
1. INTRODUCTION

Ensemble learning depends on training a set of trainers and then applying these trainers to new data
by taking a weighted vote of the trainers' results [1]. There are many combining methods; however, the
best-known methods are bagging and boosting [2]. Ensemble learning has been approached from many
angles. Some works focused on the types of trainers, either homogeneous [3], [4] or heterogeneous [5], [6].
Other works looked at the purpose of using ensemble learning: classification [7]-[26], clustering [27]-[34],
regression [35]-[37], or streaming [38]-[42].

Ensemble learning has applications in almost all fields. In medicine, some works applied
ensemble methods to predict disease [43]-[50] or to classify the patients with each disease [50]-[56].
Ensemble learning is also used in medicine for DNA prediction [57] and for imbalanced DNA splice site
datasets [58]. Ensemble learning has applications in security [59]-[63]. Social media is no exception: many
works depend on ensemble learning to achieve better performance or to cover multiple data types for one
learner [64]-[74]. Ensemble learning is used in commerce [75]-[78] and credit cards [79]-[81]. Image
processing is a traditional field for machine learning and ensemble learning, for example [82], [83].
Industry [84]-[94], land science and geology [95]-[98], agriculture [99]-[102], weather [103]-[105],
transportation [106]-[108], and education [109] have also produced a large body of research on ensemble
learning. However, although a tremendous number of works apply ensemble learning in all fields, these
works concentrate on developing homogeneous or heterogeneous methods as individual learners, and when
it comes to the merging step, classical and simple merging methods are used.

The basic elements of ensemble learning are the diversity of the individual learners and the method of
merging the trainers' results. Combining the individual learners' results is the core of ensemble learning.
This work proposes a combining method based on dynamic voting. Many works have studied voting
methods in ensemble learning. Liu and Truszczynski [110] proposed a voting-based ensemble learning
method for partial lexicographic preference. Araz and Spannowsky [111] proposed a combine-and-conquer
approach based on Bayesian ensemble neural networks. Suchithra and Pai [112] evaluated the performance
of bagging-based k-nearest neighbor and proposed a voting rule method. Kim et al. [113] proposed a
two-stage weighting method for voting in ensemble learning. Some works proposed different methods for
voting in classification problems [114]-[119]. Details of these voting works are discussed in the related
work section. Some applications concentrated on voting to present a solution to specific problems
[120]-[128].

In this work, the proposed voting method considers how to combine the results of the individual
learning methods to obtain results for new data examples. Most previous work concentrated on
applications. In those studies, each learner was presented as one that could solve part of the problem or
achieve limited accuracy. The ensemble method was then applied, and the combining step used either the
well-known methods or rough voting. These works avoided the newer proposed methods because of their
complexity. The current work presents a lightweight method that can be applied to any type of learner and
any type of application. It belongs to the line of work that studies how to combine the results of the
individual trainers to get better global performance, a question that many works on the voting process
have addressed.

The main idea of this work is to look at the trainers as a team. In fact, the practical work leads us to
consider two teams. From the prediction vector of each trainer, we construct the diversity matrix, which
expresses the total number of different predictions for each pair of trainers. Based on this matrix, we
define for each trainer its extreme conflict trainer using the concept of maximum conflict. Each member of
the resulting pairs is then assigned to one of two sets, each called a conflict team, using the concept of
minimum conflict.


2. RELATED WORK 

Smith and Martinez [116] tested the strength of ensemble learning in handling outlier examples
without a filtering process prior to training the individual learners. They trained 9 weak trainers
on unfiltered data and then developed an ensemble learner based on rough majority voting. The ensemble
learner outperformed every individual learner regardless of the filtering method used to clean the data
before training. This work demonstrated the importance of ensemble learning and its power to deal with
data containing many noisy or outlier samples.

The goal of using ensemble learning is to reduce the variance in order to improve the accuracy
of the whole system. Ensemble learning deals with many problems addressed in machine learning, including
feature selection, error correction, imbalanced classes, missing features, incremental learning, concept
drift from nonstationary distributions, and others. Although ensemble learning has attracted attention in
machine learning only in recent years, the idea is old and might be as old as the history of humanity [129].
In our daily lives, we apply the basic concept of ensemble learning when we ask several experts in order to
gain a wide insight into problems that have multiple aspects. The early ensemble learning methods are
bagging, boosting and AdaBoost, stacked generalization, and mixture of experts [129]. The current work
concentrates on developing a method to test the diversity of the given learners and then proposes a dynamic
method to merge the learners' individual results for predicting new instances.

In the following part, some recent works that discussed the voting process and its applications to
specific problems are explored. Araz and Spannowsky [111] developed an ensemble learning method
in which the ensemble learner gives feedback to each neural network to improve the representation of the
network hypothesis. Using the ensemble learner to modify the parameters of each individual learner is very
promising and might bring machine learning to a new era in which the machine, with the help of
ensemble observations, can change the method of thinking (modify the internal hyperparameters) of each
individual learner without manual changes (human supervision).

Kim et al. [113] introduced the WAVE method for voting in ensemble learning. The method
depends on iterative procedures that assign two different weighting vectors, one for classifiers
and another for instances. The two vectors are connected in such a way that the vector of instances
determines the weights in the classifiers' weight vector. This method tries to pick the classifiers that have
a bigger chance of choosing the correct class for a given data instance; in effect, it tries to catch the
winning classifiers.

Bulletin of Electr Eng & Inf, Vol. 13, No. 3, June 2024: 1897-1912
ISSN: 2302-9285

Kuncheva and Rodriguez [114] proposed a type of ensemble of ensembles. In their work, they used
four different methods for combining the results of the trainers: naive Bayes, recall combiner, majority
voting, and weighted majority voting. First, they choose one combining method and then generate the next
combining method from the last one in a sequential manner. This method is computationally heavy,
since it first needs to develop the trainers, then add one ensemble method, then go back to change the set
of parameters in each trainer, and hence generate the subsequent combining method.

Zhang et al. [115] developed a voting method in ensemble learning to deal with the imbalanced data 
classification problem. The method depends on weighted majority voting. However, the weight of each 
classifier is generated as the solution of an optimization problem based on a differential evolution algorithm. 
This means that it is required to solve one evolutionary optimization problem for each weight for each 
classifier after setting the original training phase for each classifier. 

Liu and Truszczynski [110] presented a method for ensemble learning that depends on merging a set
of small trees (partial lexicographic preference (PLP) trees). Instead of using a large tree, the tree is
divided into a forest composed of small PLP-trees. Their results show that any combining method for the
small trees in one ensemble learner gives a result that competes with the individual learners, regardless
of the combining method used to merge the small trees. In the current work, we will show a similar result.
However, the details show that, although different combining methods give very close accuracies, one can
develop a combining method that competes with the current voting methods and can generate several
ensemble learners with close accuracies but significant diversity, which might enable us to choose the most
suitable combining method for a given application.

Cornelio et al. [117] adopted the approach that weak learners, without tuning or pruning of their
hyperparameters, when merged into one ensemble learner, can compete with any sophisticated tuned
learner. However, the merging method affects the ensemble learner not only in its accuracy but also in
the diversity that can be gained for each different ensemble learner obtained through different combining
methods. Rojarath and Songpan [119] addressed the issue of multi-class data. They proposed a
cost-sensitivity matrix of the true positives (TP). This matrix, in conjunction with a probability measure,
was used to assign a weight to each of a set of heterogeneous trainers in an ensemble learning model. It is
very important that the ensemble learner model be independent of the type of individual learners and the
type of problem at hand. In the current work, the proposed combining method is independent of the type of
each weak learner and of the problem domain.

Delgado [118] proposed a voting scheme based on a bagging approach and a confidence level (CL)
that assigns a degree of support to each weak learner. The degree of support depends on the error
probability of each individual classifier. When the number of weak learners is odd, the proposed
ensemble voting approach can compete with the simple majority voting approach for binary classification
problems. For multiclass problems, the degree of support depends on the error distribution of the
classifiers and additional knowledge of the probability distribution over the classes. Restricting the number
of classifiers to be odd and defining a different weighting method based on the number of classes present in
the problem make this method very limited in its applicability.

Xu et al. [130] proposed a decision model based on an ensemble learning approach through the 
following two-stage scheme. This approach adopts the dynamic weighting of the base classifiers which can 
be learned from successful decisions history. The model depends on classic weighted majority voting. 
However, the approach depends on a continuous refreshing of the weights based on historical decisions. 

Suchithra and Pai [112] used the nearest neighbour estimation and bagging ensemble method to 
propose an ensemble learner. Through implementing k-nearest label ranker, an ensemble model was 
proposed. The voting method was the voting rule selector (VRS) which was integrated with another 
traditional voting method to develop an ensemble learning model. 

The above-mentioned works either adopt a very simple voting approach (simple majority voting) or
propose a very complicated approach that cannot be applied in many situations. The proposed voting
method was developed based on a set of considerations. First, the voting scheme is essential in
building a proper ensemble learning model. Second, the voting method should be simple and independent
of the problem class and the nature of the individual learners. Third, the diversity of the weak learners
should be checked before deciding on the voting method. In this work, instead of considering the weak
learners as individual players who work individually to achieve the team goal, the voting method based on
the diversity matrix distributes the learners into two cooperative teams. The weights are designed to
reflect the cooperation between the two teams to achieve better results. Without loss of generality,
the method is applied to the case of binary classification. Each individual learner is trained without any
optimization of its hyperparameters. The proposed scheme can be applied to any set of
homogeneous or heterogeneous learners.


3. METHOD 

Suppose that we have a set of data $D$ and we need to choose the best class for a given element
$x \in D$ by choosing the best trainer from the set of all trainers that can be found to classify the
elements of $D$. Bayes' theorem can be used to find the best trainer $t_{opt}$ among the set of all
trainers $T$. $t_{opt}$ must be the most probable trainer that can be used to classify the elements of $D$.
Hence, according to Bayes' theorem, $t_{opt}$ is given by (1):

$$t_{opt} = \arg\max_{t \in T} p(t \mid D) = \arg\max_{t \in T} \frac{p(D \mid t)\, p(t)}{p(D)} = \arg\max_{t \in T} p(D \mid t)\, p(t) \tag{1}$$


It is clear that we have an exhaustive search problem that might not be solvable in real time, since
the set $T$ might be very large. The alternative is to relax the assumption: instead of finding the best
trainer among the set of all trainers, find the best class of a given data element using a given set
of trainers. Instead of designing a competition between the trainers to find the winning trainer, let them
cooperate to find the best class for a given element of the data set [104]. Adding more trainers can lead
to better performance [111]. Weighted voting was used in many works [113]-[115], [119]-[124].
The relative accuracy $acc_c(t)$ of a trainer $t \in T$ with respect to a class $c \in C$ is the number of
times that $t$ correctly classified an element $x \in D$ as an element of class $c$, divided by the total
number of elements in class $c$:

$$acc_c(t) = \frac{|\{x \in D_c : pre_t(x) = c\}|}{|D_c|}$$

where $D_c \subseteq D$ is the set of elements of class $c$ and $pre_t(x)$ is the class of $x$ predicted by
trainer $t$. Based on Bayes' theorem and relative accuracy, the best class $c_{opt}$ can be found from the
set of classes $C$, since $c_{opt}$ must be the most probable class given the set of all possible trainers
$T$, through (2):

$$c_{opt} = \arg\max_{c \in C} \sum_{t \in T} p(c \mid t) \times p_c(t \mid D) \tag{2}$$
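As an illustration of the relative accuracy defined above, the following minimal sketch computes $acc_c(t)$ for one trainer from its predictions. The function name `relative_accuracy` and the toy labels are assumptions made for this example only; they do not come from the experiments.

```python
from collections import Counter


def relative_accuracy(y_true, y_pred, classes):
    """Return acc_c(t) for every class c: the share of the elements of class c
    that the trainer labelled as c (hypothetical helper, for illustration)."""
    totals = Counter(y_true)                                    # |D_c|
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return {c: (hits[c] / totals[c] if totals[c] else 0.0) for c in classes}


# Toy data, invented for the example: three positive and two negative items.
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "pos", "neg", "neg"]
acc = relative_accuracy(y_true, y_pred, ["pos", "neg"])
print(acc)
```

Here the trainer recovers two of the three positive items and both negative items, so its relative accuracy differs per class even though a single overall accuracy would hide that difference.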


Figure 1 shows the structure of the proposed system, which simulates how (2) can be implemented. We
have a given number of weak learners trained on a given data set to classify the data into a given number
of classes. There are many methods to calculate the weights in (2). One method is the dynamic weighting
approach, which combines the results of the weak learners to get an optimal classification. In such
dynamic weighting, after getting the predicted class for all instances of data, the ensemble trainer can be
built.

[Figure 1: the trained learners $T_i$ are combined to produce $C_{optimal}$ of $x$.]

Figure 1. General view of ensemble learning model based on relative voting


Suppose we have $n$ independent weak learners $t_1, t_2, \ldots, t_n$ and $m$ classes
$c_1, c_2, \ldots, c_m$. One can define $P_{c_j}(t_i \mid D)$ to represent the relative accuracy of $t_i$
with respect to class $c_j$ divided by the sum of the relative accuracies of all weak learners with respect
to class $c_j$:

$$P_{c_j}(t_i \mid D) = \frac{acc_{c_j}(t_i)}{\sum_{k=1}^{n} acc_{c_j}(t_k)} \tag{3}$$

Let $p_x(c_j \mid t_i)$ denote the probability that the class of a given data item $x$ using $t_i$ is $c_j$,
$i = 1, \ldots, n$ and $j = 1, \ldots, m$. The optimal class $C_{optimal}$ of $x$ can be
determined by (4):

$$C_{optimal}(x) = \arg\max_{c_j \in C} \sum_{t_i \in T} p_x(c_j \mid t_i) \times P_{c_j}(t_i \mid D) \tag{4}$$

For an entry $x_k \in D$,

$$p_x(c_j \mid t_i) = \begin{cases} 1 & \text{if } pre(t_i, x_k) = c_j \\ 0 & \text{if } pre(t_i, x_k) \neq c_j \end{cases} \tag{5}$$


where $pre(t_i, x_k)$ is the class of $x_k$ predicted by $t_i$. $P_{c_j}(t_i \mid D)$ gives us an idea about
which weak learner is correct most of the time, and $p_x(c_j \mid t_i)$ gives us a tool to sum the accuracies
of all weak learners that give their votes dynamically to the class $c_j$, so it is a tool of dynamic voting.
Equation (5) represents the rough dynamic weighting approach: the trainer only votes for the class that
this trainer assigned to the entry.


Note that there is no direct method to calculate $p_x(c_j \mid t_i)$; (5) is one of many possible candidates
that can be used to calculate it. The optimal values can be obtained by treating $p_x(c_j \mid t_i)$ as
weights, so the problem can be stated as finding the optimal values of $p_x(c_j \mid t_i)$ that reduce the
error in the results of (4). This formulates the ensemble problem as a neural network with one input layer
and one output layer and with no hidden layers. Figure 2 shows the structure of such a network.


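[Figure 2: unclassified data is passed to Trainer 1 through Trainer n; the accuracy and predicted classes
of each trainer (Acc 1, Classes 1 through Acc n, Classes n) feed the ensemble trainer, which outputs the
optimal classes.]

Figure 2. Ensemble learning as simple neural network containing one node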


It can also be stated as an optimization problem: find the weights $p_x(c_j \mid t_i)$ that minimize the
error or maximize the gain (accuracy) of the learner expressed in (4) and (5).

In the following, two examples show how the proposed values for $p_x(c_j \mid t_i)$ in (5) might
solve cases that classical majority voting or its variations cannot solve. However, there is no theoretical
guarantee that enables us to claim that the proposed ensemble method based on (4) and (5) gives better
performance than the classical majority voting method. As we will see in the results, the proposed method
can give at least the same results as the classical majority methods. In the results, we will compare
majority voting with variations of the proposed method.

Example 1: suppose we have 6 weak learners $t_1, t_2, \ldots, t_6$ (listed as $wl_1, \ldots, wl_6$ in
Table 1). Table 1 shows the values obtained when using these 6 weak learners to classify given data into
two classes ⊕ and ⊖. Table 1 summarizes the calculation of $P_{c_j}(t_i \mid D)$, $i = 1, 2, \ldots, 6$ and
$j = 1, 2$, and of $p_x(c_j \mid t_i)$, $i = 1, 2, \ldots, 6$, for a given item $x$. Since 3 trainers
classified $x$ as ⊕ and another 3 trainers classified $x$ as ⊖, classical voting fails to assign a class
to $x$. The optimal class of $x$ can be calculated based on naive Bayes as follows:

Table 1. Dummy example 1 to show the validity of dynamic relative voting model

Learner            Acc⊕   Acc⊖   P⊕(t|D)  P⊖(t|D)  p_x(⊕|t)  p_x(⊖|t)  ⊕ voting  ⊖ voting
wl_1               0.90   0.70   0.18     0.15     1         0         0.18      0.00
wl_2               0.70   0.85   0.14     0.18     1         0         0.14      0.00
wl_3               0.90   0.70   0.18     0.15     1         0         0.18      0.00
wl_4               0.80   0.80   0.16     0.17     0         1         0.00      0.17
wl_5               0.70   0.90   0.14     0.19     0         1         0.00      0.19
wl_6               0.90   0.80   0.18     0.17     0         1         0.00      0.17
Ensemble decision                                                      0.51      0.53

$\sum_{t \in T} P_{\oplus}(t \mid D) \times p_x(\oplus \mid t) = 0.51 < \sum_{t \in T} P_{\ominus}(t \mid D) \times p_x(\ominus \mid t) = 0.53$;
hence, the ensemble prediction is $x \in \ominus$.
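The calculation in Example 1 can be traced with a short sketch of (3), (4), and (5). This is an illustrative reading of the equations rather than the code used in the experiments; the function names and the class labels `"pos"` and `"neg"` (standing for ⊕ and ⊖) are assumptions, while the accuracies and votes are copied from Table 1.

```python
def normalised_weights(accuracies):
    """Equation (3): divide each relative accuracy by the sum over all trainers."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def rough_relative_vote(acc, predictions, classes):
    """Equations (4)-(5): each trainer adds its normalised weight to the class
    it predicted and contributes nothing to the other classes."""
    weights = {c: normalised_weights([a[c] for a in acc]) for c in classes}
    scores = {c: 0.0 for c in classes}
    for i, predicted in enumerate(predictions):
        scores[predicted] += weights[predicted][i]
    winner = max(scores, key=scores.get)
    return winner, scores


# Relative accuracies and individual votes of wl_1 ... wl_6 from Table 1.
acc = [
    {"pos": 0.90, "neg": 0.70},
    {"pos": 0.70, "neg": 0.85},
    {"pos": 0.90, "neg": 0.70},
    {"pos": 0.80, "neg": 0.80},
    {"pos": 0.70, "neg": 0.90},
    {"pos": 0.90, "neg": 0.80},
]
predictions = ["pos", "pos", "pos", "neg", "neg", "neg"]

winner, scores = rough_relative_vote(acc, predictions, ["pos", "neg"])
print(winner, {c: round(s, 2) for c, s in scores.items()})
```

With unrounded weights the two scores are 2.5/4.9 and 2.5/4.75, which round to the 0.51 and 0.53 reported in Table 1, so the 3-to-3 tie is resolved in favour of ⊖.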
Example 2: this example shows that the minority can gain a greater value than the majority and can
decide the correct class. Table 2 shows the values obtained when using 7 weak learners to classify given
data into two classes ⊕ and ⊖. Table 2 summarizes the calculation of $P_{c_j}(t_i \mid D)$,
$i = 1, 2, \ldots, 7$ and $j = 1, 2$, and of $p_x(c_j \mid t_i)$, $i = 1, 2, \ldots, 7$, for a given item $x$.
Since 3 trainers classified $x$ as ⊕ and another 4 trainers classified $x$ as ⊖, classical voting assigns
the class of $x$ as ⊖, while the optimal class of $x$ based on the proposed method can be calculated as
$\sum_{t \in T} P_{\oplus}(t \mid D) \times p_x(\oplus \mid t) = 0.53 > \sum_{t \in T} P_{\ominus}(t \mid D) \times p_x(\ominus \mid t) = 0.51$.
Then, the ensemble prediction is $x \in \oplus$.

Dynamic voting based on the prediction of each trainer is assumed to give better results than each
individual trainer as well as other ensemble learning methods. In the following section we will explore the
results of real experiments. Assigning $p_x(c_j \mid t_i)$ to be zero or one is called rough relative
majority. If we relax this value to be a number between zero and one, we might get a better result. Using a
unified value for all trainers is called fixed relaxed relative majority voting.


Table 2. Dummy example 2 to show the validity of dynamic relative voting model

Learner            Acc⊕   Acc⊖   P⊕(t|D)  P⊖(t|D)  p_x(⊕|t)  p_x(⊖|t)  ⊕ voting  ⊖ voting
t_1                0.90   0.60   0.18     0.13     1         0         0.18      0.00
t_2                0.90   0.60   0.18     0.13     1         0         0.18      0.00
t_3                0.80   0.60   0.16     0.13     1         0         0.16      0.00
t_4                0.80   0.80   0.16     0.17     0         1         0.00      0.17
t_5                0.70   0.80   0.14     0.17     0         1         0.00      0.17
t_6                0.90   0.80   0.18     0.17     0         1         0.00      0.17
t_7                0.60   0.80   0.12     0.17     0         1         0.00      0.17
Ensemble decision                                                      0.53      0.51


In the experiments, many variations of (4) and (5) will be tested. The concept of diversity will be
used to propose a different way to look at ensemble learning and the voting process. Based on the
diversity concept and the conflict trainer, the trainers will be divided into 2 teams. Instead of assigning
a weight to each trainer, two weights can be assigned, one for each team. The goal of the two teams is to
assign the best class for a given input. To define the diversity matrix, assume that for a given data item
$x$ a trainer $t_1$ assigned the class $C_1$ to $x$, to which we assign the value $\theta_1(x)$, and a
trainer $t_2$ assigned the class $C_2$ to $x$, to which we assign the value $\theta_2(x)$. The diversity
between the two trainers is then the sum of the absolute differences $|\theta_1(x) - \theta_2(x)|$ over
every element $x$ in the data set. Equation (6) defines the entry in the diversity matrix for each pair of
trainers:

$$div(t_i, t_k) = \sum_{x \in D} |\theta_i(x) - \theta_k(x)| \tag{6}$$
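A minimal sketch of (6) is given below. It assumes that the outputs $\theta_i(x)$ of every trainer are already available as equal-length lists of numbers; the function name `diversity_matrix` and the small 0/1 output vectors are invented for illustration.

```python
def diversity_matrix(outputs):
    """Equation (6): outputs[i][x] holds theta_i(x); entry [i][k] is the sum over
    all examples of |theta_i(x) - theta_k(x)|. The matrix is symmetric with a
    zero diagonal."""
    n = len(outputs)
    div = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            d = sum(abs(a - b) for a, b in zip(outputs[i], outputs[k]))
            div[i][k] = div[k][i] = d
    return div


# Hypothetical outputs of three trainers on four examples.
outputs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
div = diversity_matrix(outputs)
for row in div:
    print(row)
```

With hard 0/1 labels each entry simply counts the examples on which two trainers disagree; with probabilistic outputs the same code yields fractional entries.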


For each trainer, the conflict trainer is defined as the trainer that gives the maximum difference in
the column of that trainer in the diversity matrix. Successive deletion and iteration are used to define the
pairs of conflicting trainers. First, locate the maximum value in the diversity matrix; this identifies the
first two conflict trainers. The columns and rows of those two trainers are then omitted from the diversity
matrix. The process is repeated on the resulting diversity matrix until we get an empty diversity matrix or
a diversity matrix with one trainer. This remaining trainer (if any) is called the neutral trainer, and it
loses the right to vote. Figure 3 explains the repeated process of reducing the diversity matrix for
5 trainers.
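The successive deletion procedure can be sketched as a greedy loop over the diversity matrix. The function name `conflict_pairs`, the trainer names, and the 5-by-5 matrix below are hypothetical and serve only to illustrate the steps; ties between equal maxima are broken by order of appearance, which the text does not specify.

```python
def conflict_pairs(names, div):
    """Repeatedly pick the largest remaining entry of the diversity matrix,
    pair the two trainers it belongs to, and drop their rows and columns.
    Returns the list of pairs and the neutral trainer (or None)."""
    remaining = list(range(len(names)))
    pairs = []
    while len(remaining) >= 2:
        i, k = max(
            ((a, b) for idx, a in enumerate(remaining) for b in remaining[idx + 1:]),
            key=lambda ab: div[ab[0]][ab[1]],
        )
        pairs.append((names[i], names[k]))
        remaining = [r for r in remaining if r not in (i, k)]
    neutral = names[remaining[0]] if remaining else None
    return pairs, neutral


# Hypothetical symmetric diversity matrix for five trainers.
names = ["t1", "t2", "t3", "t4", "t5"]
div = [
    [0, 3, 2, 4, 1],
    [3, 0, 5, 9, 2],
    [2, 5, 0, 3, 7],
    [4, 9, 3, 0, 6],
    [1, 2, 7, 6, 0],
]
pairs, neutral = conflict_pairs(names, div)
print(pairs, neutral)
```

In this toy matrix the largest entry pairs t2 with t4; among the remaining trainers the largest entry pairs t3 with t5, and t1 is left over as the neutral trainer.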
[Figure 3: step-by-step reduction of the diversity matrix for 5 trainers. At each step the maximum value
identifies the most conflicting pair, whose rows and columns are deleted; the last remaining trainer is
the neutral trainer.]

Figure 3. Using diversity matrix phase 1 in generating diversity teams (conflicting pairs)


Now, we have determined the pairs of most conflicting trainers. Suppose $(t_i, t_k)$ and $(t_j, t_l)$ are
conflicting pairs. This means that $t_i$ conflicts with $t_k$ and $t_j$ conflicts with $t_l$. What, then,
is the relation between the members of different pairs, for example between $t_k$ and $t_j$ or between
$t_i$ and $t_l$? From the original diversity matrix, the two trainers with minimal diversity are
added to one set. This process can be repeated to get two sets of diversity teams. A dynamic relaxed
voting method can be proposed based on the concept of diversity teams. The dynamic relaxed relative
majority voting depends heavily on these conflict sets. Figure 4 explains how to get such conflict
sets.
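The text leaves the exact assignment rule open, so the sketch below is only one possible reading of the team-building step: the first conflicting pair seeds the two teams, and each later pair is oriented so that the total diversity between its members and the team they join is as small as possible. The function name `diversity_teams`, this orientation rule, and the toy matrix are all assumptions.

```python
def diversity_teams(pairs, div):
    """pairs: conflicting pairs as index tuples; div: symmetric diversity matrix.
    Assumed rule: keep the members of a pair on opposite teams and choose the
    orientation with the smaller within-team diversity."""
    (first_a, first_b), rest = pairs[0], pairs[1:]
    team1, team2 = [first_a], [first_b]
    for c, d in rest:
        keep = sum(div[c][m] for m in team1) + sum(div[d][m] for m in team2)
        swap = sum(div[d][m] for m in team1) + sum(div[c][m] for m in team2)
        if keep <= swap:
            team1.append(c)
            team2.append(d)
        else:
            team1.append(d)
            team2.append(c)
    return team1, team2


# Hypothetical diversity matrix for trainers 0..4; trainer 0 is neutral.
div = [
    [0, 3, 2, 4, 1],
    [3, 0, 5, 9, 2],
    [2, 5, 0, 3, 7],
    [4, 9, 3, 0, 6],
    [1, 2, 7, 6, 0],
]
pairs = [(1, 3), (2, 4)]  # conflicting pairs found in phase 1
team1, team2 = diversity_teams(pairs, div)
print(team1, team2)
```

For the second pair, placing trainer 4 with trainer 1 and trainer 2 with trainer 3 costs 2 + 3 = 5, against 5 + 6 = 11 for the opposite orientation, so the sketch returns the teams [1, 4] and [3, 2].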


[Figure 4: the minimum diversity values between members of the conflicting pairs determine which trainers
are grouped together, giving Team 1 and Team 2.]

Figure 4. Using diversity matrix phase 2 in generating diversity teams (cooperative members)


In this work we can define 4 different types of voting strategies. Equations (4) and (5) represent the
relative voting strategy. If the relative accuracy is replaced by the absolute accuracy of each trainer, one
gets the absolute relative voting strategy. Depending on the concept of the diversity matrix and the
diversity teams, one can get many voting strategies, where we propose a value $\alpha \in [0, 1]$ to
represent $p_x(c_j \mid t_i)$ for the first team and use $1 - \alpha$ to represent $p_x(c_j \mid t_i)$ for
the other team. A number of trials can be done to find the best value of $\alpha$. In the experiments,
3 values were determined to give the highest accuracies. The corresponding ensemble trainers were called
R1, R2, and R3.

Equations (7) to (9) correspond to relative absolute dynamic voting, where the relative accuracy of each
trainer is replaced by the absolute accuracy $acc(t_i)$.


$$C_{optimal}(x_k) = \arg\max_{c_j \in C} \sum_{t_i \in T} p_x(c_j \mid t_i) \times P(t_i \mid D) \tag{7}$$

For an entry $x_k \in D$,

$$p_x(c_j \mid t_i) = \begin{cases} 1 & \text{if } pre(t_i, x_k) = c_j \\ 0 & \text{if } pre(t_i, x_k) \neq c_j \end{cases} \tag{8}$$

$$P(t_i \mid D) = \frac{acc(t_i)}{\sum_{k=1}^{n} acc(t_k)} \tag{9}$$

The relaxed voting strategy rules can be given by replacing $p_x(c_j \mid t_i)$ with the relaxing parameter
$\alpha$ for one team and $1 - \alpha$ for the other team in (5). The majority voting rule is given by (11)
and (12):

$$C_{optimal}(x_k) = \arg\max_{c_j \in C} \sum_{t_i \in T} \Delta_{c_j}(t_i(x)) \tag{11}$$

$$\Delta_{c_j}(t_i(x)) = \begin{cases} 1 & \text{if } t_i(x) = c_j \\ 0 & \text{otherwise} \end{cases} \tag{12}$$
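To contrast the majority rule of (11) and (12) with the relaxed team rule, the sketch below implements both. It rests on an assumption about the relaxed rule that the text does not spell out: a trainer contributes only to the class it predicted, with its normalised accuracy multiplied by $\alpha$ (team 1), $1 - \alpha$ (team 2), or 0 (neutral trainer). The function names, the uniform weights, and the toy votes are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Equations (11)-(12): each trainer adds 1 to the class it predicted.
    Ties fall to the class seen first in the list."""
    return Counter(predictions).most_common(1)[0][0]


def relaxed_team_vote(predictions, team_of, alpha, weights):
    """Assumed relaxed rule: weights[c][i] is the normalised accuracy of trainer i
    for class c; team_of[i] is 1, 2, or None for a neutral trainer."""
    team_factor = {1: alpha, 2: 1.0 - alpha, None: 0.0}
    scores = {c: 0.0 for c in weights}
    for i, (predicted, team) in enumerate(zip(predictions, team_of)):
        scores[predicted] += team_factor[team] * weights[predicted][i]
    return max(scores, key=scores.get), scores


# Hypothetical case: five trainers with equal normalised accuracy 0.2.
predictions = ["neg", "neg", "pos", "pos", "pos"]
team_of = [1, 1, 2, 2, None]
weights = {"pos": [0.2] * 5, "neg": [0.2] * 5}

winner, scores = relaxed_team_vote(predictions, team_of, alpha=0.7, weights=weights)
print(majority_vote(predictions), winner)
```

In this toy case plain majority voting picks the positive class (3 votes to 2), while the relaxed rule with $\alpha = 0.7$ favours the class chosen by team 1, which illustrates how the relaxing parameter can change the ensemble decision without changing any individual prediction.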


4. RESULTS AND DISCUSSION 

All related experiments and data can be accessed through the following link. The system consists of
three types of weak trainers: an ordinary support vector machine trainer (SVMT), a decision tree trainer
(DTT), and a set of deep learning trainers. The deep learning trainers include two densely connected
trainers, one with one-hot embedding (DSH) and one with an embedding layer (DSE). Based on variations of
recurrent nets, two different trainers are built, GRNT and LSTMT. Finally, based on convolutional nets, one
Conv1D trainer was built (CONVDIT). So, the total number of weak trainers used in this work is seven.
Figure 5 shows the architecture of the deep learning trainers: Figure 5(a) shows the DSH trainer,
Figure 5(b) the DSE trainer, Figure 5(c) the GRN trainer, Figure 5(d) the LSTM trainer, and Figure 5(e)
the Conv1D trainer.


[Figure 5: layer-by-layer summaries (layer type, output shape, and parameter count) of the five neural
network trainers, panels (a) to (e).]

Figure 5. Neural networks trainers structure; (a) DSH trainer structure, (b) DSE trainer structure, (c) GRN
trainer structure, (d) LSTM trainer structure, and (e) Conv1D trainer structure
All trainers are trained using the internet movie database (IMDB) dataset, which consists of 50,000
reviews, 50% negative and 50% positive. The set of 50,000 reviews was divided into 25,000 reviews for
training and 25,000 reviews for testing. Figure 6 shows the training and validation results for loss and
accuracy of some trainers. Most of the trainers tend to overfit after a few epochs. The testing accuracies
for all trainers are listed in Table 3. The highest testing accuracy was achieved by the DSE trainer, and
the lowest testing accuracy was achieved by the DTT trainer.

Figure 6. Results of absolute accuracy and loss functions during the training of the individual trainers


Table 3. Relative accuracies of the individual trainers with respect to the negative and positive classes

Trainer   Accuracy  Negative_acc (⊖)  Positive_acc (⊕)
DSH       0.87348   0.87872           0.86824
DSE       0.87576   0.87496           0.87656
GRNT      0.85604   0.8196            0.89248
LSTMT     0.85116   0.91736           0.78496
CONVDIT   0.77824   0.77152           0.78496
DTT       0.70384   0.69928           0.7084
SVMT      0.83408   0.84336           0.8248
Min       0.70384   0.69928           0.7084
Max       0.87576   0.91736           0.89248

Table 3 also shows the relative testing accuracies as well as the absolute accuracy of each trainer
with respect to each type of review. Since there are two classes, one is called the negative class
and the other the positive class: ⊖ stands for negative reviews and ⊕ stands for positive reviews.
Table 3 shows that the lowest accuracies came from the decision tree (DTT) trainer, while the best
positive accuracy was achieved by the GRNT trainer and the best negative accuracy by the LSTMT trainer.
This was expected, since recurrent nets are known to be well suited to time series and text streams.

To test the proposed approach, we compare its accuracy with the accuracy of traditional majority
voting and of a relaxed version of the proposed approach in which the relative accuracy is replaced with
the absolute accuracy of each trainer. Figure 4 shows the details of the algorithm to apply the majority
voting, the relative accuracy approach, and the absolute accuracy approach. Table 4 represents the div
function (diversity matrix) for all trainers. The minimum difference is between DSH and SVMT, and the
maximum difference is between DTT and CONVDIT.


The experiments were designed to test the approach using a variety of methods to calculate p(©/t). The 
minimum difference between trainers was between the DSH trainer and the SVMT trainer, and the maximum 
difference was between the DTT trainer and the CONVDIT trainer. This observation is used as an 
indicator to design the values of p(©/t). 

Table 4. Initial diversity matrix of the individual trainers 

Trainer   DSH        DSE        GRNT       LSTMT      CONVDIT    DTT        SVMT 
DSH       0          2718.429   3161.526   3518.903   9044.522   7432.308   2622.949 
DSE       2718.429   0          2702.554   2668.054   8495.66    7663.706   4125.81 
GRNT      3161.526   2702.554   0          3271.968   8924.985   7612.585   4230.079 
LSTMT     3518.903   2668.054   3271.968   0          8908.16    7972.727   4103.086 
CONVDIT   9044.522   8495.66    8924.985   8908.16    0          12584.69   10260.91 
DTT       7432.308   7663.706   7612.585   7972.727   12584.69   0          7546 
SVMT      2622.949   4125.81    4230.079   4703.086   10260.91   7546       0 

The minimum value of p(©/t) will be 1 - 12584/25000 ≈ 0.4966 and the maximum value will be 
1 - 2622/25000 ≈ 0.895. In the experiments, these values are relaxed to lie between 0.5 and 1.0. Also, 
for each trainer the most conflicting trainer is defined as follows: the most conflicting trainer for a 
trainer t_i is the trainer t_j that gives the maximum difference in the column of t_i. Table 5 shows the 
conflicting trainer pairs. 
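
The two bounds above follow from a single mapping of a diversity value to a weight. The sketch below applies it to the extreme entries of Table 4 at full precision; the function name `diversity_to_weight` is hypothetical, and 25000 is the denominator used in the text (assumed here to be the number of test reviews).

```python
def diversity_to_weight(diversity, n_test):
    """Map a diversity value to a weight: 1 - diversity / n_test."""
    return 1.0 - diversity / n_test


N_TEST = 25000       # denominator used in the text (assumed test-set size)
MAX_DIV = 12584.69   # DTT vs. CONVDIT, largest entry of Table 4
MIN_DIV = 2622.949   # DSH vs. SVMT, smallest off-diagonal entry of Table 4

p_min = diversity_to_weight(MAX_DIV, N_TEST)
p_max = diversity_to_weight(MIN_DIV, N_TEST)
print(round(p_min, 4), round(p_max, 4))
```

The two results are close to 0.5 and 0.9, which is consistent with relaxing the weights to the range between 0.5 and 1.0.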


Table 5. The diversity teams resulting from applying the diversity conflicting algorithm on the diversity matrix 

Trainer   Conflict trainer 
CONVDIT   DTT 
LSTMT     SVMT 
GRNT      DSH 
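
The teams of Table 5 can be reproduced from Table 4 under one reading of the conflict rule: repeatedly pair the two unassigned trainers with the largest diversity, and treat any trainer left over as unpaired. This greedy reading is an assumption, not a restatement of the algorithm in the paper; the names `build_div` and `conflict_teams` are hypothetical, and the values are read from the lower triangle of Table 4.

```python
TRAINERS = ["DSH", "DSE", "GRNT", "LSTMT", "CONVDIT", "DTT", "SVMT"]

# Lower triangle of Table 4: each row lists the diversity against the
# trainers that precede it in TRAINERS.
LOWER = {
    "DSE": [2718.429],
    "GRNT": [3161.526, 2702.554],
    "LSTMT": [3518.903, 2668.054, 3271.968],
    "CONVDIT": [9044.522, 8495.66, 8924.985, 8908.16],
    "DTT": [7432.308, 7663.706, 7612.585, 7972.727, 12584.69],
    "SVMT": [2622.949, 4125.81, 4230.079, 4703.086, 10260.91, 7546.0],
}


def build_div(trainers, lower):
    """Build a lookup from an unordered trainer pair to its diversity."""
    div = {}
    for i, row in enumerate(trainers):
        for j, col in enumerate(trainers[:i]):
            div[frozenset((row, col))] = lower[row][j]
    return div


def conflict_teams(trainers, div):
    """Greedily pair the two unassigned trainers with the largest diversity."""
    free = list(trainers)
    teams = []
    while len(free) >= 2:
        a, b = max(
            ((x, y) for i, x in enumerate(free) for y in free[i + 1:]),
            key=lambda pair: div[frozenset(pair)],
        )
        teams.append((a, b))
        free.remove(a)
        free.remove(b)
    return teams, free  # `free` holds the unpaired trainer, if any


teams, unpaired = conflict_teams(TRAINERS, build_div(TRAINERS, LOWER))
print(teams, unpaired)
```

With these inputs the sketch returns the three pairs listed in Table 5 and leaves DSE unpaired.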


In the experiments, one trainer (the DSE trainer) is left without a conflicting partner. This trainer 
is called the neutral trainer, and it was assigned a weight of zero for p(©/t) in the experiments, which 
means that it is omitted from the voting process. Table 6 summarizes the results of all experiments. The 
first experiment tests the traditional majority voting method. The second tests the rough relative 
majority method, and the last tests a relaxed version of the rough relative majority with a unified value 
of 0.9 for all trainers. Figure 7 presents the results of testing the relaxed relative majority based on 
the conflict concept. 
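
To make the difference between the voting schemes concrete, the following sketch contrasts plain majority voting with a vote weighted by relative accuracy. It is a simplified illustration and not the algorithm of Figure 4: it assumes that a trainer's vote for a class is weighted by that trainer's accuracy on the same class (taken from Table 3), and that the neutral trainer is skipped. The function names and the example votes are made up.

```python
# Per-class accuracies from Table 3: trainer -> (negative_acc, positive_acc).
REL_ACC = {
    "DSH": (0.87872, 0.86824),
    "DSE": (0.87496, 0.87656),
    "GRNT": (0.8196, 0.89248),
    "LSTMT": (0.91736, 0.78496),
    "CONVDIT": (0.77152, 0.78496),
    "DTT": (0.69928, 0.7084),
    "SVMT": (0.84336, 0.8248),
}


def majority_vote(votes):
    """votes: trainer -> predicted class (0 = negative, 1 = positive)."""
    positives = sum(votes.values())
    # Ties (possible with an even number of voters) go to the negative class.
    return 1 if positives * 2 > len(votes) else 0


def relative_vote(votes, rel_acc, neutral=()):
    """Weight each vote by the trainer's accuracy on the class it predicts."""
    score = [0.0, 0.0]
    for trainer, label in votes.items():
        if trainer in neutral:
            continue  # weight zero: the neutral trainer does not vote
        score[label] += rel_acc[trainer][label]
    return 1 if score[1] > score[0] else 0


# Made-up votes for a single review.
votes = {"DSH": 1, "DSE": 0, "GRNT": 1, "LSTMT": 0,
         "CONVDIT": 0, "DTT": 0, "SVMT": 1}
print(majority_vote(votes), relative_vote(votes, REL_ACC, neutral={"DSE"}))
```

For this review the two schemes disagree: four of the seven trainers vote negative, but once DSE is omitted the three positive votes carry more total weight than the three remaining negative votes.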


Table 6. Results of relative accuracies of the optimal values of the relaxing parameter 

Relaxing parameter   ne_acc_ens   po_acc_ens   acc_ens 
0.528                0.88048      0.88496      0.88272 
0.504                0.88056      0.88456      0.88256 
0.564                0.87912      0.88512      0.88212 
Figure 7. Relative accuracies of the individual trainers with respect to the negative and positive classes 


Figure 8 shows the results of the relaxed approach for values of the relaxing parameter in the 
range [0,1]. The worst performance resulted from values of the relaxing parameter below 0.2; however, it 
is still competitive with the individual trainers, especially for values near 0.2. Values of the relaxing 
parameter between 0.3 and 0.4 give identical performance for the three types of accuracy. The performance 
there is also still better than that of the individual trainers and competes with other ensemble methods. 
So, to get a balanced, high performance, keep the value of the relaxing parameter between 0.3 and 0.4. The 
best value of the relaxing parameter is clearly near 0.5. Based on the experimental results, there are 
three values of the relaxing parameter that give the best performance (Table 6), and these three values 
are taken to define three different relaxed ensemble voting methods. For values greater than 0.5, the 
performance is still acceptable; however, the results are neither stable nor optimal. This analysis leads 
us to consider only the values of the relaxing parameter that are at the top of each accuracy. 
Figure 8. Relaxed ensemble results for different values of the relaxing parameter in the interval [0,1] 
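
Selecting the values "at the top of each accuracy" amounts to taking, for each accuracy column, the parameter value that maximizes it. The sketch below does this for the three rows of Table 6; in a full sweep the dictionary would hold one row per tested value in [0,1]. The function name `best_parameters` is hypothetical.

```python
# Rows of Table 6: relaxing parameter -> (ne_acc_ens, po_acc_ens, acc_ens).
RESULTS = {
    0.528: (0.88048, 0.88496, 0.88272),
    0.504: (0.88056, 0.88456, 0.88256),
    0.564: (0.87912, 0.88512, 0.88212),
}


def best_parameters(results):
    """Return the parameter value that maximizes each of the three accuracies."""
    names = ("negative", "positive", "overall")
    return {
        name: max(results, key=lambda value: results[value][k])
        for k, name in enumerate(names)
    }


best = best_parameters(RESULTS)
print(best)
```

Each of the three values in Table 6 is the maximizer of a different column: 0.504 for the negative accuracy, 0.564 for the positive accuracy, and 0.528 for the overall accuracy.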


The results confirm the known facts about ensemble learning. Any ensemble learner, regardless of 
the voting method, is superior to any individual learner [129]. The experiments also showed that there is a 
limit to the superiority that an ensemble learner can reach. In the above experiments, it was 90%. This 
accuracy limit depends on the accuracies of the individual learners. Table 7 shows the diversity matrix of the 
voting strategies used. The rough relative voting strategy, the absolute rough voting strategy, and 
the majority voting strategy give identical results, since the diversity difference between all of them is zero. 
However, using the relaxing parameter with its optimal values not only produces a competitive result 
but also gives a different diversity. This diversity is needed, especially when results different from 
those of ordinary voting methods are required. 


Table 7. Diversity matrix for ensemble models 

Model                                 Relative accuracy   Majority accuracy of ensemble model   Relative absolute accuracy   R1    R2    R3 
Relative accuracy                     0                   0                                     0                            0     0     0 
Majority accuracy of ensemble model   0                   0                                     0                            0     0     0 
Relative absolute accuracy            0                   0                                     0                            0     0     0 
R1                                    634                 634                                   634                          0     0     0 
R2                                    690                 690                                   690                          110   0     0 
R3                                    587                 587                                   587                          163   273   0 
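
A diversity matrix between ensemble models can be sketched as a pairwise disagreement count. This assumes that the integer entries of Table 7 count the test reviews on which two models output different labels, which this section does not state explicitly. The function name and the toy predictions are made up, and the sketch fills the full symmetric matrix rather than only the lower triangle.

```python
def disagreement_matrix(predictions):
    """predictions: model name -> list of predicted labels (equal lengths).

    Returns a dict of dicts holding, for every pair of models, the number
    of samples on which their predicted labels differ.
    """
    names = list(predictions)
    return {
        a: {b: sum(x != y for x, y in zip(predictions[a], predictions[b]))
            for b in names}
        for a in names
    }


# Toy predictions (made up) for three voting strategies on six reviews.
preds = {
    "majority": [0, 1, 1, 0, 1, 0],
    "relative": [0, 1, 1, 0, 1, 0],
    "relaxed":  [0, 1, 0, 0, 1, 1],
}
matrix = disagreement_matrix(preds)
print(matrix["majority"]["relative"], matrix["majority"]["relaxed"])
```

Under this reading, a zero entry means that two strategies label every test review identically, and a nonzero entry means that the relaxed models change the decision on that many reviews.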


5. CONCLUSION 

In this work, a review of ensemble learning methods and their applications in different fields was 
presented, and the voting strategies were examined in depth. This work focused on a new method for 
combining the results of individual learners. The diversity concept was defined in practical terms, and a 
voting method based on this definition was proposed. The results of the experiments show that any voting 
strategy leads to an ensemble learner that is superior to any individual learner. The diversity matrix of 
the different ensemble learners shows that all ordinary voting strategies lead to identical ensemble learners. 
However, the proposed relaxed voting method leads to genuinely different ensemble learners, whose 
diversity from other ensemble learners depends on the value of the relaxing parameter or on the 
voting strategy. 